NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

PEak: A Single Source of Truth for Hardware Design and Verification

https://doi.org/10.1145/3703456

Donovick, Caleb; Melchert, Jackson; Daly, Ross; Truong, Lenny; Raina, Priyanka; Hanrahan, Pat; Barrett, Clark (November 2024, ACM Transactions on Embedded Computing Systems)

Domain-specific languages for hardware can significantly enhance designer productivity, but sometimes at the cost of ease of verification. On the other hand, ISA specification languages are too static to be used during early stage design space exploration. We present PEak, an open-source hardware design and specification language, which aims to improve both design productivity and verification capability. PEak does this by providing a single source of truth for functional models, formal specifications, and RTL. PEak has been used in several academic projects, and PEak-generated RTL has been included in three fabricated hardware accelerators. In these projects, the formal capabilities of PEak were crucial for enabling both novel design space exploration techniques and automated compiler synthesis.
more » « less
Full Text Available
Cascade: An Application Pipelining Toolkit for Coarse-Grained Reconfigurable Arrays

https://doi.org/10.1109/TCAD.2024.3390542

Melchert, Jackson; Mei, Yuchen; Koul, Kalhan; Liu, Qiaoyi; Horowitz, Mark; Raina, Priyanka (October 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Monolithic 3-D Integration of Diverse Memories: Resistive Switching (RRAM) and Gain Cell (GC) Memory Integrated on Si CMOS

https://doi.org/10.1109/TED.2025.3556113

Liu, Shuhan; Radway, Robert M; Wang, Xinxin; Moro, Filippo; Nodin, Jean-Francois; Jana, Koustav; Yan, Lixian; Du, Shuting; Upton, Luke R; Chen, Wei-Chen; et al (May 2025, IEEE Transactions on Electron Devices)

Free, publicly-accessible full text available May 1, 2026
EMBER: Efficient Multiple-Bits-Per-Cell Embedded RRAM Macro for High-Density Digital Storage

https://doi.org/10.1109/JSSC.2024.3387566

Levy, Akash; Upton, Luke R; Scott, Michael D; Rich, Dennis; Khwa, Win-San; Chih, Yu-Der; Chang, Meng-Fan; Mitra, Subhasish; Murmann, Boris; Raina, Priyanka (July 2024, IEEE Journal of Solid-State Circuits)

Full Text Available
Onyx: A 12nm 756 GOPS/W Coarse-Grained Reconfigurable Array for Accelerating Dense and Sparse Applications

https://doi.org/10.1109/VLSITechnologyandCir46783.2024.10631383

Koul, Kalhan; Strange, Maxwell; Melchert, Jackson; Carsello, Alex; Mei, Yuchen; Hsu, Olivia; Kong, Taeyoung; Chen, Po-Han; Ke, Huifeng; Zhang, Keyi; et al (June 2024, IEEE)

Full Text Available
EMBER: A 100 MHz, 0.86 mm2, Multiple-Bits-per-Cell RRAM Macro in 40 nm CMOS with Compact Peripherals and 1.0 pJ/bit Read Circuitry

https://doi.org/10.1109/ESSCIRC59616.2023.10268807

Upton, Luke R.; Levy, Akash; Scott, Michael D.; Rich, Dennis; Khwa, Win-San; Chih, Yu-Der; Chang, Meng-Fan; Mitra, Subhasish; Raina, Priyanka; Murmann, Boris (September 2023, ESSCIRC 2023- IEEE 49th European Solid State Circuits Conference (ESSCIRC))
Canal: A Flexible Interconnect Generator for Coarse-Grained Reconfigurable Arrays

https://doi.org/10.1109/LCA.2023.3268126

Melchert, Jackson; Zhang, Keyi; Mei, Yuchen; Horowitz, Mark; Torng, Christopher; Raina, Priyanka (January 2023, IEEE Computer Architecture Letters)

The architecture of a coarse-grained reconfigurable array (CGRA) interconnect has a significant effect on not only the flexibility of the resulting accelerator, but also its power, performance, and area. Design decisions that have complex trade-offs need to be explored to maintain efficiency and performance across a variety of evolving applications. This paper presents Canal, a Python-embedded domain-specific language (eDSL) and compiler for specifying and generating reconfigurable interconnects for CGRAs. Canal uses a graph-based intermediate representation (IR) that allows for easy hardware generation and tight integration with place and route tools. We evaluate Canal by constructing both a fully static interconnect and a hybrid interconnect with ready-valid signaling, and by conducting design space exploration of the interconnect architecture by modifying the switch box topology, the number of routing tracks, and the interconnect tile connections. Through the use of a graph-based IR for CGRA interconnects, the eDSL, and the interconnect generation system, Canal enables fast design space exploration and creation of CGRA interconnects.
more » « less
Full Text Available
Amber: A 16-nm System-on-Chip With a Coarse- Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra

https://doi.org/10.1109/JSSC.2023.3313116

Feng, Kathleen; Kong, Taeyoung; Koul, Kalhan; Melchert, Jackson; Carsello, Alex; Liu, Qiaoyi; Nyengele, Gedeon; Strange, Maxwell; Zhang, Keyi; Nayak, Ankita; et al (March 2024, IEEE Journal of Solid-State Circuits)

Amber is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for acceleration of dense linear algebra applications, such as machine learning (ML), image processing, and computer vision. It is designed using an agile accelerator-compiler co-design flow; the compiler updates automatically with hardware changes, enabling continuous application-level evaluation of the hardware-software system. To increase hardware utilization and minimize reconfigurability overhead, Amber features the following: 1) dynamic partial reconfiguration (DPR) of the CGRA for higher resource utilization by allowing fast switching between applications and partitioning resources between simultaneous applications; 2) streaming memory controllers supporting affine access patterns for efficient mapping of dense linear algebra; and 3) low-overhead transcendental and complex arithmetic operations. The physical design of Amber features a unique clock distribution method and timing methodology to efficiently layout its hierarchical and tile-based design. Amber achieves a peak energy efficiency of 538 INT16 GOPS/W and 483 BFloat16 GFLOPS/W. Compared with a CPU, a GPU, and a field-programmable gate array (FPGA), Amber has up to 3902x, 152x, and 107x better energy-delay product (EDP), respectively.
more » « less
Full Text Available
APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

https://doi.org/10.1145/3582016.3582070

Melchert, Jackson; Feng, Kathleen; Donovick, Caleb; Daly, Ross; Sharma, Ritvik; Barrett, Clark; Horowitz, Mark A.; Hanrahan, Pat; Raina, Priyanka (March 2023, ACM)

The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy-efficiency of an application running on the CGRA. This paper presents APEX, an automated approach for generating specialized PE architectures for an application or an application domain. APEX first analyzes application domain benchmarks using frequent subgraph mining to extract commonly occurring computational subgraphs. APEX then generates specialized PEs by merging subgraphs using a datapath graph merging algorithm. The merged datapath graphs are translated into a PE specification from which we automatically generate the PE hardware description in Verilog along with a compiler that maps applications to the PE. The PE hardware and compiler are inserted into a flexible CGRA generation and compilation toolchain that allows for agile evaluation of CGRAs. We evaluate APEX for two domains, machine learning and image processing. For image processing applications, our automatically generated CGRAs with specialized PEs achieve from 5% to 30% less area and from 22% to 46% less energy compared to a general-purpose CGRA. For machine learning applications, our automatically generated CGRAs consume 16% to 59% less energy and 22% to 39% less area than a general-purpose CGRA. This work paves the way for creation of application domain-driven design-space exploration frameworks that automatically generate efficient programmable accelerators, with a much lower design effort for both hardware and compiler generation.
more » « less
A compute-in-memory chip based on resistive random-access memory

https://doi.org/10.1038/s41586-022-04992-8

Wan, Weier; Kubendran, Rajkumar; Schaefer, Clemens; Eryilmaz, Sukru Burc; Zhang, Wenqiang; Wu, Dabin; Deiss, Stephen; Raina, Priyanka; Qian, He; Gao, Bin; et al (August 2022, Nature)

Abstract Realizing increasingly complex artificial intelligence (AI) functionalities directly on edge devices calls for unprecedented energy efficiency of edge hardware. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) 1 promises to meet such demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory 2–5 . Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware 6–17 , it remains a goal for a RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design. Here, by co-optimizing across all hierarchies of the design from algorithms and architecture to circuits and devices, we present NeuRRAM—a RRAM-based CIM chip that simultaneously delivers versatility in reconfiguring CIM cores for diverse model architectures, energy efficiency that is two-times better than previous state-of-the-art RRAM-CIM chips across various computational bit-precisions, and inference accuracy comparable to software models quantized to four-bit weights across various AI tasks, including accuracy of 99.0 percent on MNIST 18 and 85.7 percent on CIFAR-10 19 image classification, 84.7-percent accuracy on Google speech command recognition 20 , and a 70-percent reduction in image-reconstruction error on a Bayesian image-recovery task.
more » « less
Full Text Available

Search for: All records